69 research outputs found

    Theory completion using inverse entailment

    The main real-world applications of Inductive Logic Programming (ILP) to date involve the "Observation Predicate Learning" (OPL) assumption, in which the examples and hypotheses define the same predicate. However, in both scientific discovery and language learning there are potential applications in which OPL does not hold. OPL is ingrained within the theory and performance testing of Machine Learning. A general ILP technique called "Theory Completion using Inverse Entailment" (TCIE) is introduced which is applicable to non-OPL applications. TCIE is based on inverse entailment and is closely allied to abductive inference. The implementation of TCIE within Progol5.0 is described. The implementation uses contra-positives in a similar way to Stickel's Prolog Technology Theorem Prover. Progol5.0 is tested on two datasets. The first involves a grammar which translates numbers into their representation in English. The second involves hypothesising the function of unknown genes within a network of metabolic pathways. On both datasets, near-complete recovery of predictive performance is achieved by relearning after randomly chosen portions of the background knowledge are removed. Progol5.0's running times for the experiments in this paper were typically under 6 seconds on a standard laptop PC.
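The core move in theory completion is abductive: find the missing background fact that, added to the incomplete theory, makes an observed example derivable. A minimal sketch of that idea in propositional Datalog style follows; the rules, fact names and candidate abducibles are invented for illustration, and this is not Progol5.0's actual inverse-entailment procedure.

```python
# Minimal sketch of abductive theory completion: find the ground fact
# (abducible) whose addition to the background makes an example derivable.
# Illustrative only -- not Progol5.0's inverse-entailment implementation.

def derivable(facts, rules, goal):
    """Forward-chain ground rules to a fixpoint, then test the goal."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(b in known for b in body) and head not in known:
                known.add(head)
                changed = True
    return goal in known

def abduce(facts, rules, goal, abducibles):
    """Return abducibles whose addition makes the goal derivable."""
    return [a for a in abducibles
            if not derivable(facts, rules, goal)
            and derivable(facts | {a}, rules, goal)]

# Toy pathway a -> b -> c; the enzyme for the second step is the missing
# piece of background knowledge to be completed.
facts = {("compound", "a"), ("enzyme", "e1")}
rules = [
    (("compound", "b"), [("compound", "a"), ("enzyme", "e1")]),
    (("compound", "c"), [("compound", "b"), ("enzyme", "e2")]),
]
candidates = {("enzyme", "e2"), ("enzyme", "e3")}
print(abduce(facts, rules, ("compound", "c"), candidates))
```

Only ("enzyme", "e2") closes the gap, so it is the single abduced completion; real TCIE abduces clauses with variables rather than ground facts.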

    Learning Chomsky-like grammars for biological sequence families

    This paper presents a new method of measuring performance when positives are rare, and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). As far as the authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the positive-only learning framework of CProgol. Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. The highest RA was achieved by a model which includes grammar-derived features; this RA is significantly higher than the best RA achieved without the grammar-derived features.

    Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited

    Since the late 1990s predicate invention has been under-explored within inductive logic programming due to difficulties in formulating efficient search mechanisms. However, a recent paper demonstrated that both predicate invention and the learning of recursion can be efficiently implemented for regular and context-free grammars, by way of metalogical substitutions with respect to a modified Prolog meta-interpreter which acts as the learning engine. New predicate symbols are introduced as constants representing existentially quantified higher-order variables. The approach demonstrates that predicate invention can be treated as a form of higher-order logical reasoning. In this paper we generalise the approach of meta-interpretive learning (MIL) to that of learning higher-order dyadic datalog programs. We show that with an infinite signature the higher-order dyadic datalog class H^2_2 has universal Turing expressivity, though H^2_2 is decidable given a finite signature. Additionally, we show that Knuth–Bendix ordering of the hypothesis space, together with logarithmic clause bounding, allows our MIL implementation MetagolD to PAC-learn minimal-cardinality H^2_2 definitions. This result is consistent with our experiments, which indicate that MetagolD efficiently learns compact H^2_2 definitions involving predicate invention for learning robotic strategies, the East–West train challenge and NELL. Additionally, higher-order concepts were learned in the NELL language-learning domain. The Metagol code and datasets described in this paper have been made publicly available on a website to allow reproduction of the results in this paper.
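The heart of MIL is instantiating metarules, second-order clause templates such as the chain rule P(x,y) <- Q(x,z), R(z,y), with predicate symbols until the examples are covered. The following is a drastically simplified, non-Prolog sketch of that search over a single chain metarule; the family facts are invented, and real MetagolD additionally performs predicate invention and learns recursion through a Prolog meta-interpreter.

```python
# Drastically simplified sketch of the MIL idea: instantiate the chain
# metarule P(x,y) <- Q(x,z), R(z,y) with background predicate symbols
# until the positive examples are covered. No predicate invention or
# recursion here, unlike real MetagolD.

from itertools import product

background = {
    "mother": {("ann", "amy"), ("amy", "amelia")},
    "father": {("steve", "amy")},
}

def chain(q, r):
    """Extensional composition: {(x, y) | q(x, z) and r(z, y)}."""
    return {(x, y) for (x, z1) in background[q]
                   for (z2, y) in background[r] if z1 == z2}

def learn(positives):
    """Find one chain-metarule instantiation covering all positives."""
    for q, r in product(background, repeat=2):
        if positives <= chain(q, r):
            return "target(X,Y) :- %s(X,Z), %s(Z,Y)." % (q, r)
    return None

print(learn({("ann", "amelia")}))    # grandparent via mother of mother
print(learn({("steve", "amelia")}))  # grandparent via father then mother
```

Treating the metarule's predicate variables as the only search degrees of freedom is what keeps the dyadic hypothesis space small enough to enumerate.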

    Towards meta-interpretive learning of programming language semantics

    We introduce a new application for inductive logic programming: learning the semantics of programming languages from example evaluations. In this short paper, we explore a simplified task in this domain using the Metagol meta-interpretive learning system. We highlight the challenging aspects of this scenario, including abstracting over function symbols, non-terminating examples, and learning non-observed predicates, and propose extensions to Metagol helpful for overcoming these challenges, which may prove useful in other domains. Comment: ILP 2019, to appear.

    Developing a logical model of yeast metabolism

    With the completion of the sequencing of the genomes of increasing numbers of organisms, the focus of biology is moving to determining the role of these genes (functional genomics). To this end it is useful to view the cell as a biochemical machine: it consumes simple molecules to manufacture more complex ones by chaining together biochemical reactions into long sequences referred to as metabolic pathways. Such metabolic pathways are not linear but often intersect to form complex networks. Genes play a fundamental role in these networks by providing the information to synthesise the enzymes that catalyse biochemical reactions. Although developing a complete model of metabolism is of fundamental importance to biology and medicine, the size and complexity of the network has proven beyond the capacity of human reasoning. This paper presents the first results of the Robot Scientist research programme, which aims to automatically discover the function of genes in the metabolism of the yeast Saccharomyces cerevisiae. Results include: (1) the first logical model of metabolism; (2) a method to predict phenotype by deductive inference; and (3) a method to infer reactions and gene function by abductive inference. We describe the in vivo experimental set-up which will allow these in silico predictions to be automatically tested by a laboratory robot.
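Deductive phenotype prediction in such a logical model amounts to forward-chaining the reactions whose enzyme-coding genes are intact and asking whether all essential compounds become producible. A toy sketch of that inference follows; the gene names, three-step pathway and essential-compound set are invented, and the paper's yeast model is far larger.

```python
# Sketch of deductive phenotype prediction over a toy metabolic network:
# forward-chain reactions whose coding gene is intact, then predict growth
# iff every essential compound is producible. All names are invented.

reactions = [
    # (gene coding the enzyme, substrate set, product)
    ("g1", {"glucose"}, "g6p"),
    ("g2", {"g6p"}, "pyruvate"),
    ("g3", {"pyruvate"}, "trp"),
]
essential = {"trp"}

def producible(nutrients, knockouts):
    """Compounds reachable from the growth medium given gene knockouts."""
    compounds = set(nutrients)
    changed = True
    while changed:
        changed = False
        for gene, subs, prod in reactions:
            if gene not in knockouts and subs <= compounds \
                    and prod not in compounds:
                compounds.add(prod)
                changed = True
    return compounds

def grows(nutrients, knockouts=frozenset()):
    return essential <= producible(nutrients, knockouts)

print(grows({"glucose"}))                           # wild type: grows
print(grows({"glucose"}, knockouts={"g2"}))         # pathway blocked
print(grows({"glucose", "trp"}, knockouts={"g2"}))  # rescued by supplement
```

The converse direction, observing an unexpected phenotype and inferring which reaction or gene assignment must be added to restore consistency, is the abductive step the abstract's result (3) refers to.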

    Combining inductive logic programming, active learning and robotics to discover the function of genes

    The paper is addressed to AI workers with an interest in biomolecular genetics and also to biomolecular geneticists interested in what AI tools may do for them. The authors are engaged in a collaborative enterprise aimed at partially automating some aspects of scientific work. These aspects include the processes of forming hypotheses, devising trials to discriminate between these competing hypotheses, physically performing these trials, and then using the results of these trials to converge upon an accurate hypothesis. As a potential component of the reasoning carried out by an "artificial scientist", this paper describes ASE-Progol, an Active Learning system which uses Inductive Logic Programming to construct hypothesised first-order theories and uses a CART-like algorithm to select trials for eliminating ILP-derived hypotheses. In simulated yeast growth tests, ASE-Progol was used to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy of around 88% was reduced by five orders of magnitude when trials were selected by ASE-Progol rather than being sampled at random. While the naive strategy of always choosing the cheapest trial from the set of candidate trials led to lower cumulative costs than ASE-Progol, both the naive strategy and the random strategy took significantly longer to converge upon a final hypothesis than ASE-Progol. For example, to reach an accuracy of 80%, ASE-Progol required 4 days, while random sampling required 6 days and the naive strategy required 10 days.
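The trial-selection step above can be caricatured as follows: each surviving hypothesis predicts the outcome of each candidate trial, and the system prefers the trial whose worst-case number of surviving hypotheses, weighted by the trial's cost, is smallest. This is a hedged sketch of the general cost-sensitive idea, not ASE-Progol's actual CART-like algorithm; the hypotheses, trials and costs are invented.

```python
# Sketch of cost-sensitive trial selection: each hypothesis predicts each
# trial's outcome; pick the trial with the smallest cost-weighted worst-case
# number of surviving hypotheses. Not ASE-Progol's actual algorithm.

def select_trial(hypotheses, trials, predict, cost):
    """predict(h, t) -> outcome; lower worst-case survivors * cost wins."""
    def score(t):
        by_outcome = {}
        for h in hypotheses:
            by_outcome.setdefault(predict(h, t), []).append(h)
        worst = max(len(hs) for hs in by_outcome.values())
        return worst * cost(t)
    return min(trials, key=score)

# Three hypotheses about which gene catalyses a pathway step, and two
# knockout trials of different cost (all invented for illustration).
hypotheses = ["g1", "g2", "g3"]
trials = ["knockout_g1", "knockout_g2"]
predict = lambda h, t: "no_growth" if t == "knockout_" + h else "growth"
cost = lambda t: {"knockout_g1": 2.0, "knockout_g2": 1.0}[t]
print(select_trial(hypotheses, trials, predict, cost))
```

Both trials here split the hypotheses equally well, so the cheaper knockout wins; with the cost term removed this reduces to plain worst-case hypothesis elimination.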